Foreword

This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).

The present document specifies the Voice Activity Detector (VAD) to be used in the Discontinuous Transmission 
(DTX) for the digital cellular telecommunications system. 

Archive en_300965v080000p0.zip, which accompanies the present document, contains test sequences, as described in
clause A.2.

en_300965v080000p0.zip: Annex B: Test sequences for the GSM Full Rate speech codec; test sequence files
*.inp, *.cod, *.vad.

The specification from which the present document has been derived was originally based on CEPT documentation, 
hence the presentation of the present document may not be entirely in accordance with the ETSI/PNE Rules. 

The contents of the present document are subject to continuing work within the TSG and may change following formal 
TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an 
identifying change of release date and an increase in version number as follows: 

Version x.y.z 

where: 

x the first digit:

1 presented to TSG for information; 

2 presented to TSG for approval; 

3 or greater indicates TSG approved document under change control. 

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, 
updates, etc. 

z the third digit is incremented when editorial only changes have been incorporated in the document. 
1 Scope



The present document specifies the Voice Activity Detector (VAD) to be used in the Discontinuous Transmission 
(DTX) as described in GSM 06.31. It also specifies the test methods to be used to verify that a VAD complies with the 
technical specification. 

The requirements are mandatory on any VAD to be used either in the GSM Mobile Stations (MSs) or Base Station
Systems (BSSs).



2 References



The following documents contain provisions which, through reference in this text, constitute provisions of the present 
document. 

• References are either specific (identified by date of publication, edition number, version number, etc.) or 
non-specific. 

• For a specific reference, subsequent revisions do not apply. 

• For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including 
a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same 
Release as the present document. 

[1] GSM 01.04: "Digital cellular telecommunications system (Phase 2+); Abbreviations and acronyms".

[2] GSM 06.10: "Digital cellular telecommunications system (Phase 2+); Full rate speech; Transcoding".

[3] GSM 06.12: "Digital cellular telecommunications system (Phase 2+); Full rate speech; Comfort
noise aspect for full rate speech traffic channels".

[4] GSM 06.31: "Digital cellular telecommunications system (Phase 2+); Full rate speech;
Discontinuous Transmission (DTX) for full rate speech traffic channels".



3 Abbreviations 

Abbreviations used in the present document are listed in GSM 01.04 [1]. 



4 General



The function of the VAD is to indicate whether each 20 ms frame produced by the speech encoder contains speech or 
not. The output is a binary flag which is used by the TX DTX handler defined in GSM 06.31 [4]. 

The ETS is organized as follows. 

Clause 2 describes the principles of operation of the VAD. 

In clause 3, the computational details necessary for the fixed point implementation of the VAD algorithm are given. 
This clause uses the same notation as used for computational details in GSM 06.10. 

The verification of the VAD is based on the use of digital test sequences. Clause 4 defines the input and output signals 
and the test configuration, whereas the detailed description of the test sequences is contained in clause A.2. 

The performance of the VAD algorithm is characterized by the amount of audible speech clipping it introduces and the 
percentage activity it indicates. These characteristics for the VAD defined in the present document have been 
established by extensive testing under a wide range of operating conditions. The results are summarized in clause A.3.
5 Functional description



The purpose of this clause is to give the reader an understanding of the principles of operation of the VAD, whereas the 
detailed description is given in clause 3. In case of discrepancy between the two descriptions, the detailed description of 
clause 3 shall prevail. 

In the following subclauses of clause 2, a Pascal programming type of notation has been used to describe the algorithm. 

5.1 Overview and principles of operation 

The function of the VAD is to distinguish between noise with speech present and noise without speech present. The 
biggest difficulty for detecting speech in a mobile environment is the very low speech/noise ratios which are often 
encountered. The accuracy of the VAD is improved by using filtering to increase the speech/noise ratio before the 
decision is made. 

For a mobile environment, the worst speech/noise ratios are encountered in moving vehicles. It has been found that the 
noise is relatively stationary for quite long periods in a mobile environment. It is therefore possible to use an adaptive 
filter with coefficients obtained during noise, to remove much of the vehicle noise. 

The VAD is basically an energy detector. The energy of the filtered signal is compared with a threshold; speech is 
indicated whenever the threshold is exceeded. 

The noise encountered in mobile environments may be constantly changing in level. The spectrum of the noise can also 
change, and varies greatly over different vehicles. Because of these changes the VAD threshold and adaptive filter 
coefficients must be constantly adapted. To give reliable detection the threshold must be sufficiently above the noise 
level to avoid noise being identified as speech but not so far above it that low level parts of speech are identified as 
noise. The threshold and the adaptive filter coefficients are only updated when speech is not present. It is, of course, 
potentially dangerous for a VAD to update these values on the basis of its own decision. This adaptation therefore only 
occurs when the signal seems stationary in the frequency domain but does not have the pitch component inherent in 
voiced speech. A tone detector is also used to prevent adaptation during information tones. 

A further mechanism is used to ensure that low level noise (which is often not stationary over long periods) is not 
detected as speech. Here, an additional fixed threshold is used. 

A VAD hangover period is used to eliminate mid-burst clipping of low level speech. Hangover is only added to 
speech-bursts which exceed a certain duration to avoid extending noise spikes. 



5.2 Algorithm description 



The block diagram of the VAD algorithm is shown in figure 2.1. The individual blocks are described in the following
subclauses. ACF, N and sof are calculated in the speech encoder.

Figure 2.1: Functional block diagram of the VAD

The global variables shown in the block diagram are described as follows: 

- ACF are auto-correlation coefficients which are calculated in the speech encoder defined in GSM 06.10
(subclause 3.1.4, see also clause A.1). The inputs to the speech encoder are 16 bit 2's complement numbers, as
described in GSM 06.10, subclause 4.2.0;

- av0 and av1 are averaged ACF vectors;

- rav1 are autocorrelated predictor values obtained from av1;

- rvad are the autocorrelated predictor values of the adaptive filter;

- N is the long term predictor lag value which is obtained every sub-segment in the speech coder defined in
GSM 06.10;

- ptch indicates whether the signal has a steady periodic component;

- sof is the offset compensated signal frame obtained in the speech coder defined in GSM 06.10;

- pvad is the energy in the current frame of the input signal after filtering;

- thvad is an adaptive threshold;

- stat indicates spectral stationarity;

- vvad indicates the VAD decision before hangover is added;

- vad is the final VAD decision with hangover included.
5.2.1 Adaptive filtering and energy computation



pvad is computed as follows:

\[ \mathrm{pvad} = \mathrm{rvad}_0 \, \mathrm{acf}_0 + 2 \sum_{i=1}^{8} \mathrm{rvad}_i \, \mathrm{acf}_i \]

This corresponds to performing an 8th order block filtering on the input samples to the speech encoder, after zero offset
compensation and pre-emphasis. This is explained in clause A.1.
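
The energy formula above can be sketched in floating point as follows. This is only an illustration: the function name `filtered_energy` and the constant `RVAD_INIT` are hypothetical, and the bit-exact fixed point procedure of the computational details remains the normative one.

```python
def filtered_energy(rvad, acf):
    """Return pvad = rvad[0]*acf[0] + 2 * sum(rvad[i]*acf[i], i = 1..8)."""
    if len(rvad) != 9 or len(acf) != 9:
        raise ValueError("expected 9 values (lags 0..8)")
    cross_terms = sum(r * a for r, a in zip(rvad[1:], acf[1:]))
    return rvad[0] * acf[0] + 2 * cross_terms


# Initial adaptive filter values listed for rvad[0..8] in table 2.5.
RVAD_INIT = [6, -4, 1, 0, 0, 0, 0, 0, 0]
```

With the initial filter, a frame whose autocorrelation is non-zero only at lag 0 gives pvad = 6 * acf[0].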



5.2.2 ACF averaging



Spectral characteristics of the input signal have to be obtained using blocks that are larger than one 20 ms frame. This is 
done by averaging the auto-correlation values for several consecutive frames. This averaging is given by the following 
equations: 

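\[ \mathrm{av0}\{n\}_i = \sum_{j=0}^{\mathrm{frames}-1} \mathrm{acf}\{n-j\}_i \quad ; \; i = 0..8 \]

\[ \mathrm{av1}\{n\}_i = \mathrm{av0}\{n-\mathrm{frames}\}_i \quad ; \; i = 0..8 \]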



Where n represents the current frame, n-1 represents the previous frame etc. The values of constants are given in 
table 2.1. 

Table 2.1: Constants and variables for ACF averaging

Constant    Value    Variable                      Initial value
frames      4        previous ACF's, av0 & av1     All set to 0



5.2.3 Predictor values computation 

The filter predictor values aav1 are obtained from the auto-correlation values av1 according to the equation:

\[ a = R^{-1} p \]

where R is the 8 x 8 symmetric Toeplitz matrix with elements R_{ij} = av1[|i-j|]:

\[ R = \begin{bmatrix}
av1[0] & av1[1] & av1[2] & av1[3] & av1[4] & av1[5] & av1[6] & av1[7] \\
av1[1] & av1[0] & av1[1] & av1[2] & av1[3] & av1[4] & av1[5] & av1[6] \\
av1[2] & av1[1] & av1[0] & av1[1] & av1[2] & av1[3] & av1[4] & av1[5] \\
av1[3] & av1[2] & av1[1] & av1[0] & av1[1] & av1[2] & av1[3] & av1[4] \\
av1[4] & av1[3] & av1[2] & av1[1] & av1[0] & av1[1] & av1[2] & av1[3] \\
av1[5] & av1[4] & av1[3] & av1[2] & av1[1] & av1[0] & av1[1] & av1[2] \\
av1[6] & av1[5] & av1[4] & av1[3] & av1[2] & av1[1] & av1[0] & av1[1] \\
av1[7] & av1[6] & av1[5] & av1[4] & av1[3] & av1[2] & av1[1] & av1[0]
\end{bmatrix} \]

and:

\[ p = \begin{bmatrix} av1[1] \\ av1[2] \\ av1[3] \\ av1[4] \\ av1[5] \\ av1[6] \\ av1[7] \\ av1[8] \end{bmatrix}
\qquad
a = \begin{bmatrix} aav1[1] \\ aav1[2] \\ aav1[3] \\ aav1[4] \\ aav1[5] \\ aav1[6] \\ aav1[7] \\ aav1[8] \end{bmatrix} \]

aav1[0] = -1

av1 is used in preference to av0 as av0 may contain speech.

The autocorrelated predictor values rav1 are then obtained:

\[ \mathrm{rav1}_i = \sum_{k=0}^{8-i} \mathrm{aav1}_k \, \mathrm{aav1}_{k+i} \quad ; \; i = 0..8 \]
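
As a floating point illustration of this subclause, the sketch below builds the Toeplitz system from av1, solves it by plain Gaussian elimination, and then autocorrelates the resulting coefficients to obtain rav1. The helper names `solve_linear` and `predictor_values` are hypothetical, and the sketch assumes a non-singular matrix (av1[0] > 0); the normative fixed point procedure uses the Schur recursion and a step-up procedure instead.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        acc = aug[row][n] - sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = acc / aug[row][row]
    return x


def predictor_values(av1):
    """Return (aav1[0..8], rav1[0..8]) from the averaged ACF vector av1[0..8]."""
    order = 8
    r_matrix = [[av1[abs(i - j)] for j in range(order)] for i in range(order)]
    p_vector = [av1[i + 1] for i in range(order)]
    aav1 = [-1.0] + solve_linear(r_matrix, p_vector)
    # rav1[i] = sum over k = 0 .. 8-i of aav1[k] * aav1[k+i]
    rav1 = [sum(aav1[k] * aav1[k + i] for k in range(order + 1 - i))
            for i in range(order + 1)]
    return aav1, rav1
```

For an autocorrelation that decays as 0.5 to the power of the lag, the only non-zero predictor value is aav1[1] = 0.5, so rav1[0] = 1.25 and rav1[1] = -0.5.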



5.2.4 Spectral comparison 



The spectra represented by the autocorrelated predictor values rav1 and the averaged auto-correlation values av0 are
compared using the distortion measure dm defined below. This measure is used to produce a Boolean value stat every
20 ms, as given by these equations:

\[ \mathrm{dm} = \frac{\mathrm{rav1}_0 \, \mathrm{av0}_0 + 2 \sum_{i=1}^{8} \mathrm{rav1}_i \, \mathrm{av0}_i}{\mathrm{av0}_0} \]

difference = |dm - lastdm|

lastdm = dm

stat = difference < thresh
The values of constants and initial values are given in table 2.2. 

Table 2.2: Constants and variables for spectral comparison

Constant    Value    Variable    Initial value
thresh      0.05     lastdm      0
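
A minimal floating point sketch of the spectral comparison is given below. The class name `SpectralComparator` is hypothetical, and the sketch assumes av0[0] > 0; the fixed point description treats the zero case separately.

```python
THRESH = 0.05  # constant thresh of table 2.2


class SpectralComparator:
    """Keeps lastdm between frames and returns the stat flag for each frame."""

    def __init__(self):
        self.lastdm = 0.0  # initial value of lastdm

    def update(self, rav1, av0):
        numerator = rav1[0] * av0[0] + 2 * sum(
            r * a for r, a in zip(rav1[1:], av0[1:]))
        dm = numerator / av0[0]
        difference = abs(dm - self.lastdm)
        self.lastdm = dm
        return difference < THRESH
```

Two consecutive frames with the same spectrum give a zero difference, so stat becomes true on the second frame.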






5.2.5 Periodicity detection 

The frequency spectrum of mobile noise is relatively stationary over quite long periods. The Inverse Filter 
Autocorrelated Predictor coefficients of the adaptive filter rvad are only updated when this stationarity is detected. 
Vowel sounds however, also have this stationarity, but can be excluded by detecting the periodicity of these sounds 
using the long term predictor lag values (Nj) which are obtained every sub-segment from the speech codec defined in 
GSM 06.10. Consecutive lag values are compared. Cases in which one lag value is a factor of the other are catered for, 
however cases in which both lag values have a common factor, are not. This case is not important for speech input but 
this method of periodicity detection may fail for some sine waves. The Boolean variable ptch is updated every 20 ms 
and is true when periodicity is detected. It is calculated according to the following equation: 

ptch = oldlagcount + veryoldlagcount >= nthresh 

The following operations are done after the VAD decision, once the current LTP lag values (N0 .. N3) are
available; this reduces the delay of the VAD decision. (N{-1} = N3 of the previous segment.)

lagcount = 0
for j = 0 to 3 do
begin
   smallag = maximum (N{j}, N{j-1}) mod minimum (N{j}, N{j-1})
   if minimum (smallag, minimum (N{j}, N{j-1}) - smallag) < lthresh
   then increment (lagcount)
end

veryoldlagcount = oldlagcount
oldlagcount = lagcount

The values of constants and initial values are given in table 2.3.

Table 2.3: Constants and variables for periodicity detection

Constant    Value    Variable           Initial value
lthresh     2        oldlagcount        0
nthresh     4        veryoldlagcount    0
                     N3                 40
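
The periodicity detection can be sketched as below. The class and method names are hypothetical; the constants and initial values are those of table 2.3, and the lag values are assumed to be positive integers as produced by the speech codec.

```python
LTHRESH = 2
NTHRESH = 4


class PeriodicityDetector:
    def __init__(self):
        self.oldlagcount = 0
        self.veryoldlagcount = 0
        self.oldlag = 40  # N3 of the previous frame

    def ptch(self):
        """Flag used for the current frame, based on the two previous frames."""
        return self.oldlagcount + self.veryoldlagcount >= NTHRESH

    def update(self, lags):
        """Periodicity updating, run once the four LTP lags N0..N3 are known."""
        lagcount = 0
        previous = self.oldlag
        for lag in lags:
            larger, smaller = max(lag, previous), min(lag, previous)
            smallag = larger % smaller
            if min(smallag, smaller - smallag) < LTHRESH:
                lagcount += 1
            previous = lag
        self.veryoldlagcount = self.oldlagcount
        self.oldlagcount = lagcount
        self.oldlag = previous
```

A frame whose four lags all equal the previous lag counts four matches, which is enough on its own to set ptch for the next frame.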



5.2.6 Information tone detection 

The tone flag is only evaluated in the downlink VAD. In the uplink VAD, tone detection is not performed and tone = 
false. 

Computation of the tone flag is complex. It is therefore evaluated after the processing of the current speech encoder 
frame. In this way transmission of the speech or SID frame is not delayed. 

Information tones and environmental noise can be classified by inspecting the short term prediction gain, information 
tones resulting in higher prediction gains than environmental noise. Tones can therefore be detected by comparing the 
prediction gain to a fixed threshold. By limiting the prediction gain calculation to a fourth order analysis, information 
signals consisting of one or two tones can be detected whilst minimizing the prediction gain for environmental noise. 

The prediction gain decision is implemented by comparing the normalized prediction error with a threshold. This 
measure is used to evaluate the Boolean variable tone every 20 ms. The signal is classified as a tone if the prediction 
error is smaller than the threshold predth. This is equivalent to a prediction gain threshold of 13,5 dB. 

Mobile noise can contain very strong resonances at low frequencies, resulting in a high prediction gain. A further test is 
therefore made to determine the pole frequency of a second order analysis of the signal frame. The signal is classified as 
noise if the frequency of the pole is less than 385 Hz. The pole frequency calculation is described in clause A.4.

The algorithm for detecting information tones is as follows: 

tone = false

den = a[1]*a[1]

num = 4*a[2] - a[1]*a[1]

if ( num <= 0 )
   return

if (( a[1] < 0 ) AND ( num / den < freqth ))
   return

prederr = MULT (1 - rc[i]*rc[i]), with i = 1..4 (product over i)

if (prederr < predth)
   tone = true



The values of the constants are given in table 2.4. The coefficients a[1..2] are transversal filter coefficients calculated
from rc[1..2]. The calculation of the reflection coefficients rc[1..4] is described below.

The offset compensated signal frame sof[0..159] is multiplied by the Hanning window to give the windowed frame
sofh[0..159]:

\[ \mathrm{sofh}_i = \mathrm{sof}_i \, \mathrm{hann}_i \quad ; \; i = 0..159 \]

where

\[ \mathrm{hann}_i = 0.5 \left( 1 - \cos \frac{2 \pi i}{159} \right) \quad ; \; i = 0..159 \]

The auto-correlation acfh[0..4] of the windowed signal frame is then calculated:

\[ \mathrm{acfh}_k = \sum_{i=k}^{159} \mathrm{sofh}_i \, \mathrm{sofh}_{i-k} \quad ; \; k = 0..4 \]

rc[1..4] are then calculated from acfh[0..4] using the Schur recursion described in the RPE-LTP codec.

Table 2.4: Constants for information tone detection

Constant    Value
freqth      0,0973
predth      0,0158



NOTE: Reflection coefficients are available in the RPE-LTP codec. However, they are calculated after 
pre-emphasis using a rectangular window and do not give good tone detection results. 
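
The decision part of the tone detection can be sketched as follows. The function name `tone_flag` is hypothetical; the sketch takes the coefficients a[1..2] and rc[1..4] as already computed inputs (index 0 of each list is unused) and does not cover the windowing, auto-correlation or Schur recursion steps.

```python
FREQTH = 0.0973
PREDTH = 0.0158


def tone_flag(a, rc):
    """Return True when the frame is classified as an information tone.

    a  -- list where a[1], a[2] are the second order filter coefficients
    rc -- list where rc[1]..rc[4] are the reflection coefficients
    """
    den = a[1] * a[1]
    num = 4 * a[2] - a[1] * a[1]
    if num <= 0:
        return False  # no complex pole pair
    # den cannot be zero here: the division is only reached when a[1] < 0.
    if a[1] < 0 and num / den < FREQTH:
        return False  # low frequency resonance, treated as noise
    prederr = 1.0
    for i in range(1, 5):
        prederr *= 1.0 - rc[i] * rc[i]
    return prederr < PREDTH
```

Large reflection coefficients give a small normalized prediction error, which is what marks a tone once the two pole tests have been passed.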



5.2.7 Threshold adaptation 

A check is made every 20 ms to determine whether the VAD decision threshold (thvad) should be changed. This 
adaptation is carried out according to the flowchart shown in figure 2.2. The constants used are given in table 2.5. 

Adaptation takes place in two different situations: firstly whenever ACF[0] is very low and secondly whenever there is 
a very high probability that speech and information tones are not present. 

In the first case, the threshold is adapted if the energy of the input signal is less than pth. The threshold is set to plev 
without carrying out any further tests because at these very low levels the effect of the signal quantization makes it 
impossible to obtain reliable results from these tests. 

In the second case, the decision threshold (thvad) and the adaptive filter coefficients (rvad) are only updated with the
rav1 values when there is a very high probability that speech and information tones are not present. Adaptation occurs if
the following conditions are met over a number (adp) of signal frames:

- stationarity is detected in the frequency domain;

- the signal does not contain a periodic component;

- information tones are not present.

The step-size by which the threshold is adapted is not constant but a proportion of the current value (determined by
constants dec and inc). The adaptation begins by experimentally multiplying the threshold by a factor of (1-1/dec). If
the new threshold is now higher than or equal to pvad times fac then the threshold needed to be decreased and it is left
at this new lower level. If, on the other hand, the new threshold level is less than pvad times fac then the threshold
either needed to be increased or kept constant. In this case it is set to pvad times fac unless this would mean multiplying
it by more than a factor of (1+1/inc) (in which case it is multiplied by a factor of (1+1/inc)). The threshold is never
allowed to be greater than pvad+margin.
Table 2.5: Constants and variables for threshold adaptation

Constant    Value         Variable              Initial value
pth         300 000       adaptcount            0
plev        800 000       thvad                 1 000 000
fac         3.0           rvad[0]               6
adp         8             rvad[1]               -4
inc         16            rvad[2]               1
dec         32            rvad[3] to rvad[8]    All 0
margin      80 000 000



3GPP TS 46.032 version 6.0.0 Release 6

ETSI TS 146 032 V6.0.0 (2004-12)

[Flow diagram not recoverable from the extracted text. Legible box labels: increment adaptcount; thvad = thvad - thvad/dec; thvad = min (thvad + thvad/inc, pvad*fac); thvad > pvad + margin (yes/no); rvad = rav1; adaptcount = adp + 1; END.]

Figure 2.2: Flow diagram for threshold adaptation 
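
The adaptation described in this subclause can be sketched in floating point as follows. The function name `adapt_threshold` and the dictionary-based state are hypothetical; the order of the steps follows the description above and the bit-exact procedure in the computational details, which remains the reference.

```python
PTH = 300_000.0
PLEV = 800_000.0
FAC = 3.0
ADP = 8
INC = 16
DEC = 32
MARGIN = 80_000_000.0


def adapt_threshold(state, acf0, pvad, stat, ptch, tone, rav1):
    """Update state['thvad'], state['rvad'] and state['adaptcount'] in place."""
    if acf0 < PTH:
        state["thvad"] = PLEV  # very low input level: no further tests
        return
    if ptch or not stat or tone:
        state["adaptcount"] = 0  # speech or tone may be present
        return
    state["adaptcount"] += 1
    if state["adaptcount"] <= ADP:
        return
    thvad = state["thvad"] - state["thvad"] / DEC  # trial decrease
    if thvad < pvad * FAC:
        thvad = min(thvad + thvad / INC, pvad * FAC)
    if thvad > pvad + MARGIN:
        thvad = pvad + MARGIN
    state["thvad"] = thvad
    state["rvad"] = list(rav1)
    state["adaptcount"] = ADP + 1
```

The counter is pinned at adp + 1 after an adaptation, so adaptation continues on every following frame until one of the three conditions fails.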
5.2.8 VAD decision

Prior to hangover the VAD decision condition is:

vvad = pvad > thvad

5.2.9 VAD hangover addition

VAD hangover is only added to bursts of speech greater than or equal to burstconst blocks. The Boolean variable vad 
indicates the decision of the VAD with hangover included. The values of the constants are given in table 2.6. The 
hangover algorithm is as follows: 

if vvad then increment (burstcount) else burstcount = 0

if burstcount >= burstconst then
begin
   hangcount = hangconst;
   burstcount = burstconst
end

vad = vvad or (hangcount >= 0)

if hangcount >= 0 then decrement (hangcount)

Table 2.6: Constants and variables for VAD hangover addition

Constant      Value    Variable      Initial value
burstconst    3        burstcount    0
hangconst     5        hangcount     -1
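
The hangover algorithm translates almost line by line into the sketch below. The class name `Hangover` is hypothetical; the constants and initial values are those of table 2.6.

```python
BURSTCONST = 3
HANGCONST = 5


class Hangover:
    def __init__(self):
        self.burstcount = 0
        self.hangcount = -1

    def update(self, vvad):
        """Take the decision before hangover (vvad) and return vad."""
        if vvad:
            self.burstcount += 1
        else:
            self.burstcount = 0
        if self.burstcount >= BURSTCONST:
            self.hangcount = HANGCONST
            self.burstcount = BURSTCONST
        vad = bool(vvad) or self.hangcount >= 0
        if self.hangcount >= 0:
            self.hangcount -= 1
        return vad
```

A burst of three speech frames is followed by five hangover frames, whereas a single isolated frame is not extended.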



6 Computational details

The following subclauses give the detailed description of the VAD algorithm, following the preceding high level
description. This detailed description is divided into ten subclauses related to the blocks of figure 2.1 (except
periodicity updating) in the high level description of the VAD algorithm.

Those clauses are: 

1) adaptive filtering and energy computation; 

2) ACF averaging; 

3) predictor values computation; 

4) spectral comparison; 

5) periodicity detection; 

6) threshold adaptation; 

7) VAD decision; 

8) VAD hangover addition; 

9) periodicity updating; 

10) information tone detection. 

The VAD algorithm takes as input the following variables of the RPE-LTP encoder (see the detailed description of the 
RPE-LTP encoder GSM 06.10): 

- L_ACF[0..8], auto-correlation function (GSM 06.10/4.2.4);

- scalauto, scaling factor to compute the L_ACF[0..8] (GSM 06.10/4.2.4);

- Nc, LTP lag (one for each sub-segment, GSM 06.10/4.2.11);

- sof, offset compensated signal frame (GSM 06.10/4.2.2).

So four Nc values are needed for the VAD algorithm.

The VAD computation can start as soon as the L_ACF[0..8] and scalauto variables are known. This means that the 
VAD computation can take place after part 4.2.4 of GSM 06.10 (Auto-correlation) of the LPC analysis clause of the
RPE-LTP encoder. This scheme will reduce the delay to yield the VAD information. The periodicity updating (included 
in subclause 2.2.5) and information tone detection, are done after the processing of the current speech encoder frame. 

All the arithmetic operations and names of the variables follow the RPE-LTP detailed description. To increase the 
precision within the fixed point implementation, a pseudo-floating point representation of some variables is used. This 
stands for the following variables (and related constants) of the VAD algorithm: 

- pvad: energy of filtered signal;

- thvad: threshold of the VAD decision;

- acf0: energy of input signal.

For the representation of these variables, two integers (16 bits) are needed:

- one for the exponent (e_pvad, e_thvad, e_acf0);

- one for the mantissa (m_pvad, m_thvad, m_acf0).

The value e_pvad represents the lowest power of 2 just greater than or equal to the actual value of pvad, and the m_pvad
value represents an integer which is always greater than or equal to 16384 (normalized mantissa). It means that the pvad
value is equal to:

\[ \mathrm{pvad} = 2^{\mathrm{e\_pvad}} \cdot (\mathrm{m\_pvad} / 32768) \]

This scheme guarantees a large dynamic range for the pvad value and always keeps a precision of 16 bits. All the 
comparisons are easy to make by comparing the exponents of two variables and the VAD algorithm needs only one 
pseudo-floating point addition. All the computations related to the pseudo-floating point variables require very simple 
16 or 32 bits arithmetic operations defined in the detailed description of the RPE-LTP encoder. This pseudo-floating 
point arithmetic is only used in subclauses 3.1 and 3.6. 
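
The pseudo-floating point representation can be illustrated with the sketch below, which converts the constants of the functional description into exponent and mantissa pairs. The function names are hypothetical. One assumption is made so that the mantissa always fits in 15 bits: the exponent is chosen so that 2 to the power of the exponent is strictly greater than the value, which only differs from the wording above when the value is an exact power of 2.

```python
def to_pseudo_float(value):
    """Return (exponent, mantissa) for a positive integer value."""
    if value <= 0:
        raise ValueError("value must be positive")
    exponent = 0
    while (1 << exponent) <= value:
        exponent += 1
    mantissa = (value << 15) >> exponent  # 16384 <= mantissa <= 32767
    return exponent, mantissa


def from_pseudo_float(exponent, mantissa):
    """Return the value 2**exponent * (mantissa / 32768)."""
    return (2.0 ** exponent) * mantissa / 32768.0


def pf_less(a, b):
    """Compare two normalized (exponent, mantissa) pairs: exponent first."""
    return a < b
```

For example, the initial threshold of 1 000 000 maps to exponent 20 and mantissa 31 250, and a comparison only needs to look at the mantissas when the exponents are equal.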

Table 3.1 gives a list of all the variables of the VAD algorithm that must be initialized in the reset procedure and kept in 
memory for processing the subsequent frame of the RPE-LTP encoder. The types (16 or 32 bits) and initial values of
all these variables are clearly indicated and their related subclause is also mentioned. The bit exact implementation uses 
other temporary variables that are introduced in the detailed description whenever it is needed. 
Table 3.1: Initial values for variables to be stored in memory

Names of variables:                                 type (# of bits):   Initialization:   Subclause:

Adaptive filter coefficients:
   rvad[0]                                          16                  24 576            3.1, 3.6
   rvad[1]                                          16                  -16 384           3.1, 3.6
   rvad[2]                                          16                  4 096             3.1, 3.6
   rvad[3..8]                                       16                  0                 3.1, 3.6
Scaling factor of rvad[0..8]:
   normrvad                                         16                  7                 3.1, 3.6
Delay line of the auto-correlation coefficients:
   L_sacf[0..26]                                    32                  0                 3.2
   L_sav0[0..35]                                    32                  0                 3.2
Pointers on the delay lines:
   pt_sacf                                          16                  0                 3.2
   pt_sav0                                          16                  0                 3.2
Distance measure:
   L_lastdm                                         32                  0                 3.4
Periodicity counters:
   oldlagcount                                      16                  0                 3.5, 3.9
   veryoldlagcount                                  16                  0                 3.5, 3.9
Adaptive threshold:
   e_thvad (exponent)                               16                  20                3.6
   m_thvad (mantissa)                               16                  31 250            3.6
Counter for adaptation:
   adaptcount                                       16                  0                 3.6
Hangover flags:
   burstcount                                       16                  0                 3.8
   hangcount                                        16                  -1                3.8
LTP lag memory:
   oldlag                                           16                  40                3.9
Tone detection:
   tone                                             16                  0                 3.10



6.1 Adaptive filtering and energy computation 

This subclause computes the e_pvad and m_pvad variables which represent the pvad value. It needs the L_ACF[0..8]
and scalauto variables of the RPE-LTP algorithm and the rvad[0..8] and normrvad variables produced by subclause 3.6
of the VAD algorithm. It also computes a floating point representation of L_ACF[0] (e_acf0 and m_acf0) used in
subclause 3.6.

Test if L_ACF[0] is equal to 0:

IF ( scalauto < 0 ) THEN scalvad = 0;
ELSE scalvad = scalauto; /* keep scalvad for use in subclause 3.2 */

IF ( L_ACF[0] == 0 ) THEN
| e_pvad = -32768;
| m_pvad = 0;
| e_acf0 = -32768;
| m_acf0 = 0;
| EXIT /continue with subclause 3.2/

Re-normalization of the L_ACF[0..8]:

normacf = norm( L_ACF[0] );
|== FOR i = 0 to 8:
| sacf[i] = ( L_ACF[i] << normacf ) >> 19;
|== NEXT i:

Computation of e_acf0 and m_acf0:

e_acf0 = add( 32, (scalvad << 1 ) );
e_acf0 = sub( e_acf0, normacf );
m_acf0 = sacf[0] << 3;

Computation of e_pvad and m_pvad:

e_pvad = add( e_acf0, 14 );
e_pvad = sub( e_pvad, normrvad );

L_temp = 0;
|== FOR i = 1 to 8:
| L_temp = L_add( L_temp, L_mult( sacf[i], rvad[i] ) );
|== NEXT i:

L_temp = L_add( L_temp, L_mult( sacf[0], rvad[0] ) >> 1 );

IF ( L_temp <= 0 ) THEN L_temp = 1;

normprod = norm( L_temp );
e_pvad = sub( e_pvad, normprod );
m_pvad = ( L_temp << normprod ) >> 16;
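
The procedure of this subclause can be traced with the Python sketch below. It is not a bit-exact implementation: Python integers are unbounded, so the saturation performed by the RPE-LTP basic operations is not modelled, and the helper names `norm`, `l_mult` and `energy_computation` are chosen for this illustration only.

```python
def norm(l_var):
    """Left shifts needed to bring a positive value into [2**30, 2**31)."""
    if l_var <= 0:
        raise ValueError("norm() is only sketched for positive values")
    shifts = 0
    while l_var < 0x40000000:
        l_var <<= 1
        shifts += 1
    return shifts


def l_mult(a, b):
    """Fractional 16 x 16 -> 32 bit multiply (saturation not modelled)."""
    return (a * b) << 1


def energy_computation(l_acf, scalauto, rvad, normrvad):
    """Return (e_pvad, m_pvad, e_acf0, m_acf0)."""
    scalvad = 0 if scalauto < 0 else scalauto
    if l_acf[0] == 0:
        return -32768, 0, -32768, 0
    normacf = norm(l_acf[0])
    sacf = [(value << normacf) >> 19 for value in l_acf]
    e_acf0 = 32 + (scalvad << 1) - normacf
    m_acf0 = sacf[0] << 3
    e_pvad = e_acf0 + 14 - normrvad
    l_temp = sum(l_mult(sacf[i], rvad[i]) for i in range(1, 9))
    l_temp += l_mult(sacf[0], rvad[0]) >> 1
    if l_temp <= 0:
        l_temp = 1
    normprod = norm(l_temp)
    e_pvad -= normprod
    m_pvad = (l_temp << normprod) >> 16
    return e_pvad, m_pvad, e_acf0, m_acf0
```

With the initial filter (24 576, -16 384, 4 096, normrvad = 7) and L_ACF[0] = 2^20 as the only non-zero value, the result represents pvad = 2^24 * 0.75, which is six times the value represented by e_acf0 and m_acf0, as expected from rvad[0] = 6.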

6.2 ACF averaging 

This subclause uses the L_ACF[0..8] and the scalvad variables to compute the arrays L_av0[0..8] and L_av1[0..8] used
in subclauses 3.3 and 3.4.

Computation of the scaling factor:

scal = sub( 10, (scalvad << 1) );

Computation of the arrays L_av0[0..8] and L_av1[0..8]:

|== FOR i = 0 to 8:
| L_temp = L_ACF[i] >> scal;
| L_av0[i] = L_add( L_sacf[i], L_temp );
| L_av0[i] = L_add( L_sacf[i+9], L_av0[i] );
| L_av0[i] = L_add( L_sacf[i+18], L_av0[i] );
| L_sacf[ pt_sacf + i ] = L_temp;
| L_av1[i] = L_sav0[ pt_sav0 + i ];
| L_sav0[ pt_sav0 + i ] = L_av0[i];
|== NEXT i:

Update of the array pointers:

IF ( pt_sacf == 18 ) THEN pt_sacf = 0;
ELSE pt_sacf = add( pt_sacf, 9 );

IF ( pt_sav0 == 27 ) THEN pt_sav0 = 0;
ELSE pt_sav0 = add( pt_sav0, 9 );
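
The two circular delay lines of this subclause can be sketched as follows. The class name `AcfAverager` is hypothetical, saturation in L_add is not modelled, and scalvad is assumed small enough for the shift count to stay non-negative.

```python
class AcfAverager:
    """Delay lines L_sacf (3 previous frames) and L_sav0 (4 previous av0)."""

    def __init__(self):
        self.l_sacf = [0] * 27
        self.l_sav0 = [0] * 36
        self.pt_sacf = 0
        self.pt_sav0 = 0

    def update(self, l_acf, scalvad):
        """Return (L_av0[0..8], L_av1[0..8]) for the current frame."""
        scal = 10 - (scalvad << 1)
        l_av0 = [0] * 9
        l_av1 = [0] * 9
        for i in range(9):
            l_temp = l_acf[i] >> scal
            # Sum of the current frame and the three stored frames.
            l_av0[i] = (l_temp + self.l_sacf[i] + self.l_sacf[i + 9]
                        + self.l_sacf[i + 18])
            self.l_sacf[self.pt_sacf + i] = l_temp
            # The slot about to be overwritten holds av0 from 4 frames ago.
            l_av1[i] = self.l_sav0[self.pt_sav0 + i]
            self.l_sav0[self.pt_sav0 + i] = l_av0[i]
        self.pt_sacf = 0 if self.pt_sacf == 18 else self.pt_sacf + 9
        self.pt_sav0 = 0 if self.pt_sav0 == 27 else self.pt_sav0 + 9
        return l_av0, l_av1
```

Reading L_sav0 at the write pointer before overwriting it is what yields the four frame delay between L_av0 and L_av1.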

6.3 Predictor values computation 

This subclause computes the array rav1[0..8] needed for the spectral comparison and the threshold adaptation. It uses
the L_av1[0..8] computed in subclause 3.2, and is divided into the three following subclauses:

- Schur recursion to compute reflection coefficients.

- Step-up procedure to obtain the aav1[0..8].

- Computation of the rav1[0..8].
6.3.1 Schur recursion to compute reflection coefficients

This subclause is identical to the one used in the RPE-LTP algorithm. The array vpar[1..8] is computed with the array
L_av1[0..8] as an input.

Schur recursion with 16 bits arithmetic:

IF ( L_av1[0] == 0 ) THEN
|== FOR i = 1 to 8:
| vpar[i] = 0;
|== NEXT i:
| EXIT; /continue with subclause 3.3.2/

temp = norm( L_av1[0] );
|== FOR k = 0 to 8:
| sacf[k] = ( L_av1[k] << temp ) >> 16;
|== NEXT k:

Initialize array P[..] and K[..] for the recursion:

|== FOR i = 1 to 7:
| K[9-i] = sacf[i];
|== NEXT i:

|== FOR i = 0 to 8:
| P[i] = sacf[i];
|== NEXT i:

Compute reflection coefficients:

|== FOR n = 1 to 8:
| IF ( P[0] < abs( P[1] ) ) THEN
| |== FOR i = n to 8:
| | vpar[i] = 0;
| |== NEXT i:
| | EXIT; /continue with subclause 3.3.2/
| vpar[n] = div( abs( P[1] ), P[0] );
| IF ( P[1] > 0 ) THEN vpar[n] = sub( 0, vpar[n] );
| IF ( n == 8 ) THEN EXIT; /continue with subclause 3.3.2/
|
| Schur recursion:
|
| P[0] = add( P[0], mult_r( P[1], vpar[n] ) );
|==== FOR m = 1 to 8-n:
| P[m] = add( P[m+1], mult_r( K[9-m], vpar[n] ) );
| K[9-m] = add( K[9-m], mult_r( P[m+1], vpar[n] ) );
|==== NEXT m:
|== NEXT n:
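
The recursion can be followed with the Python sketch below. It is an illustration rather than a bit-exact port: 16 bit saturation is not modelled, the helpers `norm`, `mult_r` and `div` are simplified stand-ins for the RPE-LTP basic operations, and the extra guard on P[0] <= 0 is an addition of this sketch to keep the division well defined.

```python
def norm(l_var):
    """Left shifts needed to bring a positive value into [2**30, 2**31)."""
    shifts = 0
    while l_var < 0x40000000:
        l_var <<= 1
        shifts += 1
    return shifts


def mult_r(a, b):
    """Rounded fractional multiply, (a * b + 16384) >> 15."""
    return (a * b + 16384) >> 15


def div(num, denum):
    """15 bit fractional division, valid for 0 <= num <= denum, denum > 0."""
    quotient = 0
    for _ in range(15):
        quotient <<= 1
        num <<= 1
        if num >= denum:
            num -= denum
            quotient += 1
    return quotient


def schur_reflection_coefficients(l_av1):
    """Return vpar[0..8]; vpar[0] is unused and left at 0."""
    vpar = [0] * 9
    if l_av1[0] == 0:
        return vpar
    temp = norm(l_av1[0])
    sacf = [(value << temp) >> 16 for value in l_av1]
    k_arr = [0] * 10
    for i in range(1, 8):
        k_arr[9 - i] = sacf[i]
    p_arr = list(sacf)
    for n in range(1, 9):
        if p_arr[0] <= 0 or p_arr[0] < abs(p_arr[1]):
            break  # remaining vpar[n..8] stay at 0
        vpar[n] = div(abs(p_arr[1]), p_arr[0])
        if p_arr[1] > 0:
            vpar[n] = -vpar[n]
        if n == 8:
            break
        p_arr[0] += mult_r(p_arr[1], vpar[n])
        for m in range(1, 9 - n):
            new_p = p_arr[m + 1] + mult_r(k_arr[9 - m], vpar[n])
            k_arr[9 - m] += mult_r(p_arr[m + 1], vpar[n])
            p_arr[m] = new_p
    return vpar
```

For an input whose autocorrelation halves at each lag, only the first reflection coefficient is non-zero and equals -16384, that is -0.5 in Q15.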



6.3.2 Step-up procedure to obtain the aav1[0..8]

Initialization of the step-up recursion:

L_coef[0] = 16384 << 15;
L_coef[1] = vpar[1] << 14;

Loop on the LPC analysis order:

|= FOR m = 2 to 8:
|== FOR i = 1 to m-1:
| temp = L_coef[m-i] >> 16; /* takes the msb */
| L_work[i] = L_add( L_coef[i], L_mult( vpar[m], temp ) );
|== NEXT i
|
|== FOR i = 1 to m-1:
| L_coef[i] = L_work[i];
|== NEXT i
|
| L_coef[m] = vpar[m] << 14;
|= NEXT m:

Keep the aav1[0..8] on 13 bits for next clause:

|== FOR i = 0 to 8:
| aav1[i] = L_coef[i] >> 19;
|== NEXT i:



6.3.3 Computation of the rav1[0..8]

|= FOR i = 0 to 8:
| L_work[i] = 0;
|== FOR k = 0 to 8-i:
| L_work[i] = L_add( L_work[i], L_mult( aav1[k], aav1[k+i] ) );
|== NEXT k:
|= NEXT i:

IF ( L_work[0] == 0 ) THEN normrav1 = 0;
ELSE normrav1 = norm( L_work[0] );

|== FOR i = 0 to 8:
| rav1[i] = ( L_work[i] << normrav1 ) >> 16;
|== NEXT i:

Keep the normrav1 for use in subclauses 3.4 and 3.6.

6.4 Spectral comparison 

This subclause computes the variable stat needed for the threshold adaptation. It uses the array L_av0[0..8] computed in
subclause 3.2 and the array rav1[0..8] computed in subclause 3.3.3.

Re-normalize L_av0[0..8]:

IF ( L_av0[0] == 0 ) THEN
|== FOR i = 0 to 8:
| sav0[i] = 4095;
|== NEXT i:
ELSE
| shift = norm( L_av0[0] );
|== FOR i = 0 to 8:
| sav0[i] = ( L_av0[i] << shift-3 ) >> 16;
|== NEXT i:

Compute partial sum of dm:

L_sump = 0;
|== FOR i = 1 to 8:
| L_sump = L_add( L_sump, L_mult( rav1[i], sav0[i] ) );
|== NEXT i:

Compute the division of the partial sum by sav0[0]:

IF ( L_sump < 0 ) THEN L_temp = L_sub( 0, L_sump );
ELSE L_temp = L_sump;

IF ( L_temp == 0 ) THEN
| L_dm = 0;
| shift = 0;
ELSE
| sav0[0] = sav0[0] << 3;
| shift = norm( L_temp );
| temp = ( L_temp << shift ) >> 16;
| IF ( sav0[0] >= temp ) THEN
| | divshift = 0;
| | temp = div( temp, sav0[0] );
| ELSE
| | divshift = 1;
| | temp = sub( temp, sav0[0] );
| | temp = div( temp, sav0[0] );
| IF ( divshift == 1 ) THEN L_dm = 32768;
| ELSE L_dm = 0;
| L_dm = L_add( L_dm, temp ) << 1;
| IF ( L_sump < 0 ) THEN L_dm = L_sub( 0, L_dm );

Re-normalization and final computation of L_dm:

L_dm = ( L_dm << 14 );
L_dm = L_dm >> shift;
L_dm = L_add( L_dm, ( rav1[0] << 11 ) );
L_dm = L_dm >> normrav1;

Compute the difference and save L_dm:

L_temp = L_sub( L_dm, L_lastdm );
L_lastdm = L_dm;
IF ( L_temp < 0 ) THEN L_temp = L_sub( 0, L_temp );
L_temp = L_sub( L_temp, 3277 );

Evaluation of the stat flag:

IF ( L_temp < 0 ) THEN stat = 1;
ELSE stat = 0;



6.5 Periodicity detection 

This subclause just sets the ptch flag needed for the threshold adaptation. 

temp = add ( oldlagcount, veryoldlagcount ); 
IF ( temp >= 4 ) THEN ptch = 1; 
ELSE ptch = 0; 

6.6 Threshold adaptation 

This subclause uses the variables e_pvad, m_pvad, e_acf0 and m_acf0 computed in subclause 3.1. It also uses the flags
stat (see subclause 3.4) and ptch (see subclause 3.5). It follows the flowchart represented in figure 2.2.

Some constants, represented by a floating point format, are needed and a symbolic name (in capital letter) for their 
exponent and mantissa is used; table 3.2 lists all these constants with the symbolic names associated and their numerical 
constant values. 

Table 3.2: List of constants

Constant    Exponent         Mantissa
pth         E_PTH = 19       M_PTH = 18 750
margin      E_MARGIN = 27    M_MARGIN = 19 531
plev        E_PLEV = 20      M_PLEV = 25 000

NOTE: Floating point representation of constants used in subclause 3.6:

pth = 2^(E_PTH) x (M_PTH/32768).

margin = 2^(E_MARGIN) x (M_MARGIN/32768).

plev = 2^(E_PLEV) x (M_PLEV/32768).



Test if acf0 < pth; if yes set thvad to plev:

comp = 0;

IF ( e_acf0 < E_PTH ) THEN comp = 1;

IF ( e_acf0 == E_PTH ) THEN IF ( m_acf0 < M_PTH ) THEN comp = 1;

IF ( comp == 1 ) THEN
| e_thvad = E_PLEV;
| m_thvad = M_PLEV;
| EXIT; /continue with subclause 3.7/



Test if an adaptation is needed: 



comp = 0;
IF ( ptch == 1 ) THEN comp = 1;
IF ( stat == 0 ) THEN comp = 1;
IF ( tone == 1 ) THEN comp = 1;
IF ( comp == 1 ) THEN
| adaptcount = 0;
| EXIT; /continue with subclause 3.7/



Incrementation of adaptcount: 



adaptcount = add ( adaptcount, 1 ); 

IF ( adaptcount <= 8 ) THEN EXIT; /continue with subclause 3.7/ 



Computation of thvad-(thvad/dec): 



m_thvad = sub ( m_thvad, ( m_thvad >> 5 ) );
IF ( m_thvad < 16384 ) THEN
| m_thvad = m_thvad << 1;
| e_thvad = sub ( e_thvad, 1 );



Computation of pvad*fac: 



L_temp = L_add ( m_pvad, m_pvad );
L_temp = L_add ( L_temp, m_pvad );
L_temp = L_temp >> 1;
e_temp = add ( e_pvad, 1 );
IF ( L_temp > 32767 ) THEN
| L_temp = L_temp >> 1;
| e_temp = add ( e_temp, 1 );
m_temp = L_temp;



Test if thvad < pvad*fac: 



comp = 0; 

IF ( e_thvad < e_temp) THEN comp = 1; 

IF (e_thvad == e_temp) THEN IF (m_thvad < m_temp) THEN comp =1; 

Computation of minimum (thvad+(thvad/inc), pvad*fac) if comp = 1: 



IF ( comp == 1 ) THEN
| Compute thvad + (thvad/inc).
| L_temp = L_add ( m_thvad, ( m_thvad >> 4 ) );
| IF ( L_temp > 32767 ) THEN
|   | m_thvad = L_temp >> 1;
|   | e_thvad = add ( e_thvad, 1 );
| ELSE m_thvad = L_temp;
| comp2 = 0;
| IF ( e_temp < e_thvad ) THEN comp2 = 1;
| IF ( e_temp == e_thvad ) THEN IF ( m_temp < m_thvad ) THEN comp2 = 1;
| IF ( comp2 == 1 ) THEN
|   | e_thvad = e_temp;
|   | m_thvad = m_temp;

Computation of pvad + margin: 

IF ( e_pvad == E_MARGIN ) THEN
| L_temp = L_add ( m_pvad, M_MARGIN );
| m_temp = L_temp >> 1;
| e_temp = add ( e_pvad, 1 );
ELSE
| IF ( e_pvad > E_MARGIN ) THEN
|   | temp = sub ( e_pvad, E_MARGIN );
|   | temp = M_MARGIN >> temp;
|   | L_temp = L_add ( m_pvad, temp );
|   | IF ( L_temp > 32767 ) THEN
|   |   | e_temp = add ( e_pvad, 1 );
|   |   | m_temp = L_temp >> 1;
|   | ELSE
|   |   | e_temp = e_pvad;
|   |   | m_temp = L_temp;
| ELSE
|   | temp = sub ( E_MARGIN, e_pvad );
|   | temp = m_pvad >> temp;
|   | L_temp = L_add ( M_MARGIN, temp );
|   | IF ( L_temp > 32767 ) THEN
|   |   | e_temp = add ( E_MARGIN, 1 );
|   |   | m_temp = L_temp >> 1;
|   | ELSE
|   |   | e_temp = E_MARGIN;
|   |   | m_temp = L_temp;



Test if thvad > pvad + margin: 



comp = 0;
IF ( e_thvad > e_temp ) THEN comp = 1;
IF ( e_thvad == e_temp ) THEN IF ( m_thvad > m_temp ) THEN comp = 1;

IF ( comp == 1 ) THEN
| e_thvad = e_temp;
| m_thvad = m_temp;



Initialize new rvad[0..8] in memory: 

normrvad = normrav1;

|== FOR i = 0 to 8:
| rvad[i] = rav1[i];
|== NEXT i;

Set adaptcount to adp + 1: 

adaptcount = 9; 
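The arithmetic of the adaptation step in this subclause can be followed more easily in ordinary floating point. The sketch below is an illustration only, not the bit-exact procedure: it ignores the pseudo-floating point rounding, the early exits and the update of rvad. The values dec = 32, inc = 16 and fac = 3 are inferred here from the shifts by 5 and 4 and from the triple addition of m_pvad; the function name adapt_threshold is hypothetical.

```python
MARGIN = (2.0 ** 27) * 19531 / 32768.0  # margin constant, decoded from table 3.2


def adapt_threshold(thvad: float, pvad: float,
                    dec: float = 32.0, inc: float = 16.0, fac: float = 3.0,
                    margin: float = MARGIN) -> float:
    """One threshold adaptation step in plain floating point (assumed values)."""
    # Computation of thvad - (thvad/dec).
    thvad -= thvad / dec
    # If thvad < pvad*fac, take the minimum of thvad + (thvad/inc) and pvad*fac.
    if thvad < pvad * fac:
        thvad = min(thvad + thvad / inc, pvad * fac)
    # If thvad > pvad + margin, limit thvad to pvad + margin.
    if thvad > pvad + margin:
        thvad = pvad + margin
    return thvad


if __name__ == "__main__":
    print(adapt_threshold(3200.0, 100.0))   # only the decrement applies
    print(adapt_threshold(3200.0, 1040.0))  # increment limited by pvad*fac
```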

6.7 VAD decision 

This subclause only outputs the result of the comparison between pvad and thvad using the pseudo-floating point 
representation of thvad and pvad. The values e_pvad and m_pvad are computed in subclause 3.1 and the values e_thvad 
and m_thvad are computed in subclause 3.6. 

vvad = 0;
IF ( e_pvad > e_thvad ) THEN vvad = 1;
IF ( e_pvad == e_thvad ) THEN IF ( m_pvad > m_thvad ) THEN vvad = 1;
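The comparison used for the decision can be sketched as follows. This is an illustration under the assumption that both mantissas are normalized (between 16384 and 32767), so that a larger exponent always means a larger value; the function names are hypothetical.

```python
def pf_greater(e_a: int, m_a: int, e_b: int, m_b: int) -> bool:
    """True if pseudo-float (e_a, m_a) is greater than (e_b, m_b).

    Assumes normalized mantissas, so exponents can be compared first.
    """
    if e_a > e_b:
        return True
    return e_a == e_b and m_a > m_b


def vad_decision(e_pvad: int, m_pvad: int, e_thvad: int, m_thvad: int) -> int:
    """Decision before hangover: 1 if pvad > thvad, otherwise 0."""
    return 1 if pf_greater(e_pvad, m_pvad, e_thvad, m_thvad) else 0


if __name__ == "__main__":
    print(vad_decision(21, 16384, 20, 32767))  # larger exponent wins
    print(vad_decision(20, 25000, 20, 25000))  # equal values: no speech
```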



6.8 VAD hangover addition 

This subclause finally sets the vad decision for the current frame to be processed. 

IF ( vvad == 1 ) THEN burstcount = add ( burstcount, 1 );
ELSE burstcount = 0;

IF ( burstcount >= 3 ) THEN
| hangcount = 5;
| burstcount = 3;

vad = vvad;

IF ( hangcount >= 0 ) THEN
| vad = 1;
| hangcount = sub ( hangcount, 1 );
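The effect of the hangover can be seen by running the logic of this subclause over a short sequence of decisions. The sketch below is an illustration only; the class name Hangover is hypothetical, and the initial values burstcount = 0 and hangcount = -1 are an assumption, since the initial values are not given in this subclause.

```python
class Hangover:
    """Frame-by-frame hangover addition (initial state is an assumption)."""

    def __init__(self, burstcount: int = 0, hangcount: int = -1) -> None:
        self.burstcount = burstcount
        self.hangcount = hangcount

    def update(self, vvad: int) -> int:
        """Take the decision before hangover and return the final vad flag."""
        if vvad == 1:
            self.burstcount += 1
        else:
            self.burstcount = 0
        # A burst of three or more frames (re)starts the hangover period.
        if self.burstcount >= 3:
            self.hangcount = 5
            self.burstcount = 3
        vad = vvad
        if self.hangcount >= 0:
            vad = 1
            self.hangcount -= 1
        return vad


def run(decisions):
    """Apply the hangover to a list of per-frame decisions."""
    state = Hangover()
    return [state.update(v) for v in decisions]


if __name__ == "__main__":
    print(run([1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]))
```

With these assumptions a burst of three frames is extended by five frames, while a burst of two frames gets no extension.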



6.9 Periodicity updating 



This subclause must be delayed until the LTP lags are computed by the RPE-LTP algorithm. The LTP lags called Nc in
the speech encoder are renamed lags[0..3] (index 0 for the first sub-segment of the frame, 1 for the second and so on).

Loop on sub-segments for the frame:

lagcount = 0;

FOR i = 0 to 3:

    Search the maximum and minimum of consecutive lags.
    IF ( oldlag > lags[i] ) THEN
    | minlag = lags[i];
    | maxlag = oldlag;
    ELSE
    | minlag = oldlag;
    | maxlag = lags[i];

    Compute smallag (modulo operation not defined):

    smallag = maxlag;
    |== FOR j = 0 to 2:
    | IF ( smallag >= minlag ) THEN smallag = sub ( smallag, minlag );
    |== NEXT j;

    Minimum of smallag and minlag - smallag:

    temp = sub ( minlag, smallag );
    IF ( temp < smallag ) THEN smallag = temp;
    IF ( smallag < 2 ) THEN lagcount = add ( lagcount, 1 );

    Save the current LTP lag.

    oldlag = lags[i];

NEXT i;

Update the veryoldlagcount and oldlagcount:

veryoldlagcount = oldlagcount;
oldlagcount = lagcount;
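The lag comparison above can be sketched as follows, together with the ptch test of subclause 6.5. This is an illustration only; the function names are hypothetical and the lag values in the example are arbitrary.

```python
def count_periodic_lags(lags, oldlag):
    """Return (lagcount, new oldlag) for the four LTP lags of one frame."""
    lagcount = 0
    for lag in lags:
        # Maximum and minimum of two consecutive lags.
        if oldlag > lag:
            minlag, maxlag = lag, oldlag
        else:
            minlag, maxlag = oldlag, lag
        # Up to three subtractions stand in for the undefined modulo operation.
        smallag = maxlag
        for _ in range(3):
            if smallag >= minlag:
                smallag -= minlag
        # Minimum of smallag and minlag - smallag.
        temp = minlag - smallag
        if temp < smallag:
            smallag = temp
        if smallag < 2:
            lagcount += 1
        oldlag = lag
    return lagcount, oldlag


def ptch_flag(oldlagcount: int, veryoldlagcount: int) -> int:
    """ptch is set when the two previous frames together show 4 or more hits."""
    return 1 if oldlagcount + veryoldlagcount >= 4 else 0


if __name__ == "__main__":
    print(count_periodic_lags([40, 80, 40, 120], 40))  # multiples of one lag
    print(count_periodic_lags([40, 57, 75, 100], 40))  # unrelated lags
```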

6.10 Tone detection 

This subclause computes the tone variable needed for the threshold adaptation. Tone is only calculated for the VAD in 
the downlink. In the uplink VAD tone=0. 

To reduce delay, this subclause should be calculated after the processing of the current speech encoder frame. 



6.10.1 Windowing

This subclause applies a Hanning window to the input frame sof[0..159] to form the output frame sofh[0..159]. The
input frame is the current offset compensated signal frame calculated in the RPE-LTP codec. The array of constants
hann[i] is defined in table 3.2.

Multiply signal frame by Hanning window:

|== FOR i = 0 to 79:
| sofh[i] = mult_r ( sof[i], hann[i] );
| sofh[159-i] = mult_r ( sof[159-i], hann[i] );
|== NEXT i;

6.10.2 Auto-correlation 

This subclause computes the auto-correlation vector L_acfh[0..4] from the windowed input frame sofh[0..159]. The
input frame must be scaled in order to avoid an overflow situation. This subclause is identical to the one used in the
RPE-LTP algorithm, with the exception that only five auto-correlation values are calculated.

Dynamic scaling of the array sofh[0..159]:

Search for the maximum:

smax = 0;
|== FOR k = 0 to 159:
| temp = abs ( sofh[k] );
| IF ( temp > smax ) THEN smax = temp;
|== NEXT k;

Computation of the scaling factor:

IF ( smax == 0 ) THEN scalauto = 0;
ELSE scalauto = sub ( 4, norm ( smax << 16 ) );

Scaling of the array sofh[0..159]:

IF ( scalauto > 0 ) THEN
| temp = 16384 >> sub ( scalauto, 1 );
|== FOR k = 0 to 159:
| sofh[k] = mult_r ( sofh[k], temp );
|== NEXT k;

Compute the L_acfh[..]:

|== FOR k = 0 to 4:
| L_acfh[k] = 0;
|==== FOR i = k to 159:
| L_temp = L_mult ( sofh[i], sofh[i-k] );
| L_acfh[k] = L_add ( L_acfh[k], L_temp );
|==== NEXT i;
|== NEXT k;
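The final loop of this subclause can be sketched as follows. This is an illustration only: it assumes that L_mult returns twice the product of its operands, as in the basic operations of GSM 06.10, and it ignores saturation and the dynamic scaling step. The function name autocorrelation is hypothetical.

```python
def autocorrelation(sofh, lags: int = 5):
    """Auto-correlation values for lags 0..lags-1 of one windowed frame."""
    acf = []
    for k in range(lags):
        total = 0
        for i in range(k, len(sofh)):
            # L_mult is assumed to double the product (no saturation here).
            total += 2 * sofh[i] * sofh[i - k]
        acf.append(total)
    return acf


if __name__ == "__main__":
    # A constant frame of 160 samples: each lag k sums 160 - k products.
    print(autocorrelation([1] * 160))
```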



6.10.3 Computation of the reflection coefficients

This subclause calculates the reflection coefficients rc[1..4] from the input array L_acfh[0..4]. This procedure is
identical to the one in subclause 3.3.1 and the RPE-LTP codec, with the exception that only four reflection coefficients
are calculated.

Schur recursion with 16 bits arithmetic:

IF ( L_acfh[0] == 0 ) THEN
|== FOR i = 1 to 4:
| rc[i] = 0;
|== NEXT i;
| EXIT; /continue with subclause 3.10.4/

temp = norm ( L_acfh[0] );
|== FOR k = 0 to 4:
| sacf[k] = ( L_acfh[k] << temp ) >> 16;
|== NEXT k;

Initialize array P[..] and K[..] for the recursion:

|== FOR i = 1 to 3:
| K[5-i] = sacf[i];
|== NEXT i;

|== FOR i = 0 to 4:
| P[i] = sacf[i];
|== NEXT i;

Compute reflection coefficients:

|== FOR n = 1 to 4:
| IF ( P[0] < abs ( P[1] ) ) THEN
|   |== FOR i = n to 4:
|   | rc[i] = 0;
|   |== NEXT i;
|   | EXIT; /continue with subclause 3.10.4/
| rc[n] = div ( abs ( P[1] ), P[0] );
| IF ( P[1] > 0 ) THEN rc[n] = sub ( 0, rc[n] );
| IF ( n == 4 ) THEN EXIT; /continue with subclause 3.10.4/
|
| Schur recursion:
|
| P[0] = add ( P[0], mult_r ( P[1], rc[n] ) );
|==== FOR m = 1 to 4-n:
| P[m] = add ( P[m+1], mult_r ( K[5-m], rc[n] ) );
| K[5-m] = add ( K[5-m], mult_r ( P[m+1], rc[n] ) );
|==== NEXT m;
|== NEXT n;



6.10.4 Filter coefficient calculation

This subclause calculates the direct form filter coefficients a[1..2] from the reflection coefficients rc[1..4].

Step-up procedure to obtain the a[1..2]:

temp = rc[1] >> 2;
a[1] = add ( temp, mult_r ( rc[2], temp ) );
a[2] = rc[2] >> 2;



6.10.5 Pole Frequency Test 



This subclause uses the direct form filter coefficients a[1..2] to determine the pole frequency of the second order LPC
analysis. If the pole frequency is less than 385 Hz tone is set to 0 and clause 3 terminates.

L_den = L_mult ( a[1], a[1] );
L_temp = a[2] << 16;
L_num = L_sub ( L_temp, L_den );

If pole is not complex then exit:

IF ( L_num <= 0 ) THEN
| tone = 0;
| EXIT; /clause 3 complete/

If pole frequency is less than 385 Hz then exit:

IF ( a[1] < 0 ) THEN
| temp = L_den >> 16;
| L_den = L_mult ( temp, 3189 );
| L_temp = L_sub ( L_num, L_den );
| IF ( L_temp < 0 ) THEN
|   | tone = 0;
|   | EXIT; /clause 3 complete/



6.10.6 Prediction gain test

This subclause uses the reflection coefficients rc[1..4] to calculate the prediction gain. If the prediction gain is greater
than 13,5 dB then tone is set to 1 otherwise tone is set to 0.

Calculate normalized prediction error:

prederr = 32767;
|== FOR i = 1 to 4:
| temp = mult ( rc[i], rc[i] );
| temp = sub ( 32767, temp );
| prederr = mult ( prederr, temp );
|== NEXT i;

Test if prediction error is smaller than threshold:

temp = sub ( prederr, 1464 );
IF ( temp < 0 ) THEN tone = 1;
ELSE tone = 0;
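The relation between the threshold 1464 and the 13,5 dB figure can be checked with a short sketch. It is an illustration only: mult is modelled here as a 16 bit fractional multiply without rounding, (a*b) >> 15, and saturation is ignored, which is an assumption about the basic operation. The function names are hypothetical.

```python
import math


def mult(a: int, b: int) -> int:
    """Fractional multiply without rounding (assumed model, no saturation)."""
    return (a * b) >> 15


def tone_from_rc(rc) -> int:
    """Prediction gain test on four reflection coefficients in Q15 format."""
    prederr = 32767
    for r in rc:
        temp = mult(r, r)
        temp = 32767 - temp
        prederr = mult(prederr, temp)
    return 1 if prederr - 1464 < 0 else 0


# Prediction gain, in dB, that corresponds to the threshold 1464/32768.
THRESHOLD_DB = -10.0 * math.log10(1464 / 32768)

if __name__ == "__main__":
    print(round(THRESHOLD_DB, 2))
    print(tone_from_rc([32200, 0, 0, 0]), tone_from_rc([32000, 0, 0, 0]))
```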
Table 3.2: Values of the Hanning window array hann[i]

 i    hann     i    hann     i    hann     i    hann
 0       0    20    4856    40   16545    60   28139
 1      12    21    5325    41   17192    61   28581
 2      51    22    5811    42   17838    62   29003
 3     114    23    6314    43   18482    63   29406
 4     204    24    6832    44   19122    64   29789
 5     318    25    7365    45   19758    65   30151
 6     458    26    7913    46   20389    66   30491
 7     622    27    8473    47   21014    67   30809
 8     811    28    9046    48   21631    68   31105
 9    1025    29    9631    49   22240    69   31377
10    1262    30   10226    50   22840    70   31626
11    1523    31   10831    51   23430    71   31852
12    1807    32   11444    52   24009    72   32053
13    2114    33   12065    53   24575    73   32230
14    2444    34   12693    54   25130    74   32382
15    2795    35   13326    55   25670    75   32509
16    3167    36   13964    56   26196    76   32611
17    3560    37   14607    57   26707    77   32688
18    3972    38   15251    58   27201    78   32739
19    4405    39   15898    59   27679    79   32764
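The tabulated values appear to follow a symmetric 160 point Hanning window scaled by 32768 and truncated. The sketch below is an inference, not a definition taken from the present document: the formula and the function name hann_q15 are assumptions, checked only against a few entries of the table. The table remains the normative source of the values.

```python
import math


def hann_q15(i: int, length: int = 160) -> int:
    """Assumed generator: floor(32768 * 0.5 * (1 - cos(2*pi*i/(length-1))))."""
    w = 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (length - 1)))
    return math.floor(32768 * w)


if __name__ == "__main__":
    for i in (0, 1, 9, 11, 20, 40, 79):
        print(i, hann_q15(i))
```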



7 Digital test sequences



This clause provides information on the digital test sequences that have been designed to help the verification of 
implementations of the Voice Activity Detector. Copies of these sequences are available (see clause A.2). 



7.1 Test configuration 



The VAD must be tested in conjunction with the speech encoder defined in GSM 06.10. The test configuration is shown 
in figure 4.1. The input signal to the speech encoder is the sop[...] signal as defined in GSM 06.10 table 5.1. The 
relevant parameters produced by the speech encoder are input to the VAD algorithm to produce the VAD output. This 
output has to be checked against some reference files. 

The file format of the encoder output parameters given in GSM 06.10 table 5.1 is extended to carry the VAD 
information. 



The VAD information is placed in the unused bit 15 (MSB) of the first encoded parameter:

LAR(1): bit 15 = 1 if VAD on
        bit 15 = 0 if VAD off

Furthermore, in order to facilitate approval testing over the air interface, the SP flag generated by the TX DTX handler
(see GSM 06.31) on the basis of the VAD flag is placed in the MSB position of the second encoded parameter:

LAR(2): bit 15 = 1 if SP on
        bit 15 = 0 if SP off
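The placement of the two flags can be sketched as follows. This is an illustration only; it assumes each encoded parameter is held as an unsigned 16 bit word, with LAR(1) and LAR(2) as the first two words of the frame, and the function names are hypothetical.

```python
FLAG_BIT = 1 << 15  # bit 15 (MSB) of a 16 bit parameter word


def set_flag(word: int, flag: bool) -> int:
    """Return the word with bit 15 set or cleared according to flag."""
    if flag:
        return (word | FLAG_BIT) & 0xFFFF
    return word & ~FLAG_BIT & 0xFFFF


def get_flag(word: int) -> int:
    """Read bit 15 of a parameter word."""
    return (word >> 15) & 1


def tag_frame(params, vad: bool, sp: bool):
    """Copy a frame and place VAD in LAR(1) bit 15 and SP in LAR(2) bit 15."""
    out = list(params)
    out[0] = set_flag(out[0], vad)
    out[1] = set_flag(out[1], sp)
    return out


if __name__ == "__main__":
    frame = tag_frame([0x002A, 0x0015, 7], vad=True, sp=False)
    print([hex(w) for w in frame], get_flag(frame[0]), get_flag(frame[1]))
```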



The output file will also contain the SID codeword and the comfort noise parameters as described in GSM 06.12 and
GSM 06.31.

[Figure: block diagram. Labels: 8 kHz clock; 13 bit PCM; Reset; RPE-LTP encoder (260 bits); VAD flag (1 bit); TX DTX handler; SID or speech (260 bits); SP flag (1 bit); COMPARISON]

Figure 4.1: VAD test configuration

7.2 Test sequences

The test sequences are described in detail in clause A.2.

Annex A (informative):



A.1 Simplified block filtering operation 

Consider an 8th order transversal filter with filter coefficients a_0..a_8, through which a signal is being passed, the output
of the filter being:

s'(n) = -\sum_{i=0}^{8} a_i \, s(n-i)    (1)

If we apply block filtering over 20 ms segments, then this equation becomes:

s'(n) = -\sum_{i=0}^{8} a_i \, s(n-i) ; \quad 0 \le n-i \le 159 ; \quad 0 \le n \le 167    (2)

If the energy of the filtered signal is then obtained for every 20 ms segment, the equation for this is:

Pvad = \sum_{n=0}^{167} \left\{ -\sum_{i=0}^{8} a_i \, s(n-i) \right\}^2 ; \quad 0 \le n-i \le 159    (3)

We know that (see GSM 06.10, subclause 3.1.4):

acf_i = \sum_{n=0}^{159} s(n) \, s(n-i) ; \quad 0 \le n-i \le 159 ; \quad i = 0..8    (4)

If equation (3) is expanded and acf_0..acf_8 are substituted for s(n) then we arrive at the equations:

Pvad = r_0 \, acf_0 + 2 \sum_{i=1}^{8} r_i \, acf_i    (5)

Where:

r_i = \sum_{k=0}^{8-i} a_k \, a_{k+i} ; \quad i = 0..8    (6)
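The equivalence of the direct energy computation of equation (3) and the auto-correlation form of equations (5) and (6) can be checked numerically. The sketch below is an illustration only; the coefficient and signal values are arbitrary integers chosen so that the comparison is exact, and the function names are hypothetical.

```python
def pvad_direct(a, s):
    """Energy of the block-filtered signal, following equation (3)."""
    total = 0
    for n in range(len(s) + len(a) - 1):      # n = 0..167 for 160 samples
        acc = 0
        for i in range(len(a)):
            if 0 <= n - i < len(s):           # samples outside the block are 0
                acc += a[i] * s[n - i]
        total += acc * acc                    # the sign of (1) vanishes here
    return total


def pvad_from_acf(a, s):
    """Same energy from acf[0..8] and r[0..8], equations (4) to (6)."""
    order = len(a) - 1
    acf = [sum(s[n] * s[n - i] for n in range(i, len(s)))
           for i in range(order + 1)]
    r = [sum(a[k] * a[k + i] for k in range(order - i + 1))
         for i in range(order + 1)]
    return r[0] * acf[0] + 2 * sum(r[i] * acf[i] for i in range(1, order + 1))


if __name__ == "__main__":
    coeffs = [1, -2, 3, 0, 1, -1, 2, 0, 1]               # arbitrary a[0..8]
    signal = [((7 * n) % 11) - 5 for n in range(160)]    # arbitrary 20 ms block
    print(pvad_direct(coeffs, signal), pvad_from_acf(coeffs, signal))
```

The second form needs only nine auto-correlation values per frame, which the speech encoder already computes.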



A.2 Description of digital test sequences 



A.2.1 Test sequences 



The VAD algorithm uses results from the full rate speech encoder defined in GSM 06.10. In the testing of the VAD, it
is assumed that the relevant speech encoder functions have been verified by the test sequences defined in GSM 06.10.

The five types of input sequences are briefly described below.

Spectral comparison

The two kinds of statements of the spectral comparison algorithm (subclause 3.4), arithmetic statements and control
statements, are tested by separate test sequences.

Arithmetic statements:

spec_a1.*
spec_a2.*

Control statements:

spec_c1.*
spec_c2.*
spec_c3.*
spec_c4.*

Threshold adaptation

There are two types of tests to verify the threshold adaptation described in subclause 3.6:

adapt_i1.*
adapt_i2.*

The initial test sequences test the acf0 and VAD decision. A fault in the VAD decision will cause all the other
sequences to fail, so it is recommended that this test is run before all other tests.

adapt_m1.*
adapt_m2.*

The main test sequences will check the basic threshold adaptation mechanism.

Periodicity detection

pitch1.*
pitch2.*

These sequences check the periodicity detection algorithm described in subclause 3.5.

Tone detection

The tone detector test sequences are only required for downlink VAD implementations. There are three types of test to
verify the tone detection algorithm described in subclause 3.10. The first test sequence tests the operation of the tone
detector by means of a frequency sweep:

freq_sw.*

The following test sequences test the prediction gain calculation within the tone detector:

pred1.*
pred2.*

The following sequences test the second order pole frequency calculation within the tone detector:

pole1.*
pole2.*

"Safety" and initialization



3GPP TS 46.032 version 6.0.0 Release 6    ETSI TS 146 032 V6.0.0 (2004-12)

safety.* 

This sequence checks that safety tests have been implemented to prevent zero values being passed to the norm function. 
It checks the functions described in the Adaptive Filtering and Energy Computation subclause (subclause 3.1), and the 
Predictor Values Computation (subclause 3.3). This sequence also checks the initialization of thvad and the rvad array. 

Real speech 

good_sp.* 

bad_sp.* 

Because the test sequences cannot be guaranteed to find every possible error, there is a small possibility that an
implementation will produce the correct output for the test sequences, but fail with real speech. Because of this, an extra
set of sequences is included, consisting of barely detectable speech and very clean speech.

There are 3 different file extensions: 

*.inp: speech encoder input sequences, binary files

*.vad: output flag of the VAD algorithm, ASCII files

*.cod: TX DTX handler output sequences, binary files for comparison with VAD/DTX handler output.

The *.cod files contain speech coder output information in the format described in clause 4. 

It should be noted that there is no requirement in GSM 06.12 for a bit exact implementation of the averaging procedure 
to calculate the "LAR" and "xmax" parameters in the SID frames. Different implementations are allowed. 

The algorithms used for the calculation of the LAR and xmax parameters of the SID frames are therefore reproduced 
below: 

LAR averaging:

FOR i = 1 to 8:
    L_Temp = 2; /* const. for rounding */
    | FOR n = 1 to 4:
    | L_Temp1 = LAR[j-n](i); /* conversion 16 -> 32 bit */
    | L_Temp = L_Add ( L_Temp, L_Temp1 );
    | NEXT n;
    L_Temp = L_Temp >> 2;
    mean (LAR(i)) = L_Temp; /* conversion 32 -> 16 bit */
NEXT i;

xmax averaging:

L_Temp = 8; /* const. for rounding */
FOR n = 1 to 4:
    | FOR i = 1 to 4:
    | L_Temp1 = xmax[j-n](i); /* conversion 16 -> 32 bit */
    | L_Temp = L_Add ( L_Temp, L_Temp1 );
    | NEXT i;
NEXT n;
L_Temp = L_Temp >> 4;
mean (xmax) = L_Temp; /* conversion 32 -> 16 bit */
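The two averaging procedures can be sketched as follows. This is an illustration only; it assumes that the history holds the last four frames, with eight LAR values and four xmax values per frame, and the function names are hypothetical.

```python
def mean_lar(lar_history):
    """Rounded mean of each of the 8 LAR values over 4 frames."""
    means = []
    for i in range(8):
        total = 2                      # constant for rounding (half of 4)
        for frame in lar_history:
            total += frame[i]
        means.append(total >> 2)       # divide by 4
    return means


def mean_xmax(xmax_history):
    """Rounded mean of the 16 xmax values (4 sub-segments over 4 frames)."""
    total = 8                          # constant for rounding (half of 16)
    for frame in xmax_history:
        for x in frame:
            total += x
    return total >> 4                  # divide by 16


if __name__ == "__main__":
    print(mean_lar([[1] * 8, [2] * 8, [3] * 8, [4] * 8]))
    print(mean_xmax([[1, 2, 3, 4]] * 4))
```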



A.2.2 File format description 



All the *.inp and *.cod files are written in binary using 16 bit words, while all *.vad files are written in ASCII format.
The sizes of the files are shown in tables A.2.1, A.2.2 and A.2.3. The detailed format of the *.inp and *.cod files is in
accordance with the descriptions given in GSM 06.10 clause 5.

Table A.2.1: File sizes for *.inp extension files

File:           Frames:    Size in bytes:
spec_a1.inp          22             7 040
spec_a2.inp          22             7 040
spec_c1.inp          48            15 360
spec_c2.inp          48            15 360
spec_c3.inp          48            15 360
spec_c4.inp          48            15 360
adapt_i1.inp         67            21 440
adapt_i2.inp         48            15 360
adapt_m1.inp        403           128 960
adapt_m2.inp        376           120 320
pitch1.inp           35            11 200
pitch2.inp           35            11 200
freq_sw.inp         560           179 200
pred1.inp           126            40 320
pred2.inp           126            40 320
pole1.inp            97            31 040
pole2.inp            42            13 440
safety.inp            5             1 600
good_sp.inp         312            99 840
bad_sp.inp          312            99 840



Table A.2.2: File sizes for *.cod extension files

File:           Frames:    Size in bytes:
spec_a1.cod          22             3 344
spec_a2.cod          22             3 344
spec_c1.cod          48             7 296
spec_c2.cod          48             7 296
spec_c3.cod          48             7 296
spec_c4.cod          48             7 296
adapt_i1.cod         67            10 184
adapt_i2.cod         48             7 296
adapt_m1.cod        403            61 256
adapt_m2.cod        376            57 152
pitch1.cod           35             5 320
pitch2.cod           35             5 320
freq_sw.cod         560            85 120
pred1.cod           126            19 152
pred2.cod           126            19 152
pole1.cod            97            14 744
pole2.cod            42             6 384
safety.cod            5               760
good_sp.cod         312            47 424
bad_sp.cod          312            47 424






Table A.2.3: File sizes for *.vad extension files

File:           Frames:    Size in bytes:
spec_a1.vad          22                88
spec_a2.vad          22                88
spec_c1.vad          48               192
spec_c2.vad          48               192
spec_c3.vad          48               192
spec_c4.vad          48               192
adapt_i1.vad         67               268
adapt_i2.vad         48               192
adapt_m1.vad        403             1 612
adapt_m2.vad        376             1 504
pitch1.vad           35               140
pitch2.vad           35               140
freq_sw.vad         560             2 240
pred1.vad           126               504
pred2.vad           126               504
pole1.vad            97               388
pole2.vad            42               168
safety.vad            5                20
good_sp.vad         312             1 248
bad_sp.vad          312             1 248
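The file sizes in the three tables are consistent with a fixed number of bytes per frame. The sketch below is an illustration only; the per-frame figures are inferred from the tables (160 samples of 16 bits for *.inp, 76 parameter words of 16 bits for *.cod, and 4 bytes for *.vad) and are not stated in this subclause. The function name expected_size is hypothetical.

```python
# Bytes per frame, inferred from the frame counts and sizes in the tables.
BYTES_PER_FRAME = {
    "inp": 160 * 2,   # 160 input samples, 16 bit words
    "cod": 76 * 2,    # 76 encoder parameters, 16 bit words
    "vad": 4,         # ASCII flag record, 4 bytes per frame (inferred)
}


def expected_size(frames: int, extension: str) -> int:
    """Expected file size in bytes for a sequence with the given frame count."""
    return frames * BYTES_PER_FRAME[extension]


if __name__ == "__main__":
    for ext in ("inp", "cod", "vad"):
        print(ext, expected_size(22, ext))
```

A check of this kind can be used to detect truncated files before a test run.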



A.3 VAD performance



In optimizing a VAD a difficult trade-off has to be made between speech clipping which reduces the subjective 
performance of the system, and the average activity factor. The benefit of DTX is increased as the average activity 
factor is reduced. However, in general, a reduction of the activity will be associated with a greater risk for audible 
speech clipping. 

In the optimization process, great emphasis has been placed on avoiding unnecessary speech clipping. However, it has 
been found that a VAD with virtually no audible clipping would result in a very high activity and very little DTX 
advantage. 

The VAD specified in this technical specification introduces audible and possibly objectionable clipping in certain
cases, mainly with low input levels. However, in a comprehensive evaluation programme consisting of about 600
individual conversations conducted in a wide range of realistic conditions, it was found that about 90% of the
conversations were free from objectionable clipping.

The voice activity performance of the VAD is summarized in table A.3.1. The activity figures are averages of a large
number of conversations covering factors like different talkers, noise characteristics and locations. It should be noted
that the actual activity of a particular talker in a specific conversation may vary considerably relative to the averages
given. This is due both to the variation in talker behaviour and to the level dependency of the VAD (the channel
activity has been found to decrease by about 0,5 percentage points per dB level reduction). However, as mentioned
above, a decreased speech input level increases the risk of objectionable speech clipping.

All the values given are activity figures, i.e. the % of time the radio channel has to be on. 

Table A.3.1: Summary of channel activity

Telephone instrument    Situation                                                    Typical channel activity factor:
Handset                 Quiet location                                               55%
Handset                 Moderate office noise with voice interference                60%
Handset                 Strong voice interference (e.g. airport/railway station)     65-70%
Handsfree/handset       Variable vehicle noise                                       60%

A.4 Pole frequency calculation



This annex describes the algorithm used to determine whether the pole frequency for a second order analysis of the 
signal frame is less than 385 Hz. 

The filter coefficients for a second order synthesis filter are calculated from the first two unquantized reflection 
coefficients rc[1..2] obtained from the speech encoder. This is done using the routine described in subclause 3.10.4. If 
the filter coefficients a[0..2] are defined such that the synthesis filter response is given by: 

H(z) = 1 / ( a[0] + a[1]*z^-1 + a[2]*z^-2 ) (1)

Then the positions of the poles in the Z-plane are given by the solutions to the following quadratic:

a[0]*z^2 + a[1]*z + a[2] = 0, a[0] = 1 (2)

The positions of the poles, z, are therefore:

z = re ± j*sqrt(im), j^2 = -1 (3)

where:

re = -a[1]/2 (4)

im = (4*a[2] - a[1]^2)/4 (5)

If im is negative then the poles lie on the real axis of the Z-plane and the signal is not a tone and the algorithm
terminates. If re is negative then the poles lie in the left hand side of the Z-plane and the frequency is greater than
2 000 Hz and the prediction error test can be performed.

If im is positive and re is positive then the poles are complex and lie in the right hand side of the Z-plane and the
frequency in Hz is related to re and im by the expression:

freq = arctan ( sqrt(im)/re ) * 4 000 / pi (6)

Having ensured that both im and re are positive, the test for a dominant frequency less than 385 Hz can be derived by
substituting Equations 4 and 5 into Equation 6 and re-arranging:

(4*a[2] - a[1]^2) / a[1]^2 < (tan(pi*385/4 000))^2 (7)

or

(4*a[2] - a[1]^2) / a[1]^2 < 0.0973 (8)

If this test is true then the signal is not a tone and the algorithm terminates, otherwise the prediction error test is 
performed. 
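The pole frequency relations of this annex can be sketched in floating point as follows. This is an illustration only and not the fixed-point test of the tone detector; the function names are hypothetical, the sampling rate of 8 000 Hz is taken from the factor 4 000 / pi in equation (6), and atan2 is used so that poles with negative re are also mapped to a frequency above 2 000 Hz.

```python
import math


def pole_frequency_hz(a1: float, a2: float, fs: float = 8000.0):
    """Pole frequency from a[1], a[2] with a[0] = 1; None if poles are real."""
    re = -a1 / 2.0                       # equation (4)
    im = (4.0 * a2 - a1 * a1) / 4.0      # equation (5)
    if im <= 0.0:
        return None                      # poles on the real axis: not a tone
    # Equation (6); atan2 also covers re < 0 (frequency above fs/4).
    return math.atan2(math.sqrt(im), re) * (fs / 2.0) / math.pi


def coeffs_for_pole(freq_hz: float, radius: float = 0.95, fs: float = 8000.0):
    """Hypothetical helper: a[1], a[2] for a pole pair at the given frequency."""
    theta = 2.0 * math.pi * freq_hz / fs
    return -2.0 * radius * math.cos(theta), radius * radius


# Right hand side of equation (7), to be compared with the 0.0973 of (8).
LIMIT = math.tan(math.pi * 385.0 / 4000.0) ** 2

if __name__ == "__main__":
    print(round(LIMIT, 4))
    print(pole_frequency_hz(*coeffs_for_pole(300.0)))
```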
Annex B (normative):
Test sequences

The test vectors described in the present document are supplied in archive en_300965v080000p0.zip which
accompanies the present document. The files contained in this archive are listed in clause A.2.

The full rate test vectors apply to both GSM Phase 1 and Phase 2. However, the files pole1.*, pole2.*, pred1.*, pred2.*
and freq_sw.* are not required for Phase 1 (uplink and downlink) and Phase 2 uplink implementations.

Annex C (informative):